ABSTRACT

This paper investigates the conditions under which a discrete
optimization problem can be formulated as a dynamic program.
Following the terminology of (Karp and Held 1967), a discrete
optimization problem is formalized as a discrete decision problem
and the class of dynamic programs is formalized as a sequential
decision process. Necessary and sufficient conditions are
established for the representation, in two different senses, of a
discrete decision problem by a sequential decision process. In the
first sense (a strong representation) the set of all optimal
solutions to the discrete optimization problem is obtainable from
the solution of the functional equations of dynamic programming.
In the second sense (a weak representation) a nonempty subset of
optimal solutions is obtainable from the solution of the
functional equations of dynamic programming. It is shown that the
well known principle of optimality corresponds to a strong
representation. A more general version of the principle of
optimality is given which corresponds to a weak representation of
a discrete decision problem by a sequential decision process. We
also show that the class of strongly representable discrete
decision problems is equivalent to the class of sequential
decision processes which have cost functions satisfying a strict
monotonicity condition. A new derivation is also given of the
result that the class of weakly representable discrete decision
problems is equivalent to the class of sequential decision
processes which have a cost function satisfying a monotonicity
condition.



1. Introduction.



Dynamic programming has proven to be one of the principal
methods for the formulation and solution of discrete optimization
problems. A number of studies have explored the extent to which
dynamic programming is applicable to such problems, including
(Mitten 1964, Held and Karp 1967, Elmaghraby 1970, Bonzon 1970,
Ibaraki 1972, 1973, and others cited in the references). A recent
survey of solution techniques and applications of dynamic
programming appears in (Morin 1978). Mitten was the first to point
out the essential role that the monotonicity of the cost function
plays in a dynamic program. Subsequently, (Held and Karp 1967)
studied dynamic programs in terms of a finite state machine with
a superimposed cost structure (an sdp as defined below), and
tion problem is obtainable from the solution of the functional
equations of dynamic programming. In the second sense (a weak
representation) a nonempty subset of optimal solutions is
obtainable from the solution of the functional equations of
dynamic programming. It is shown that the well known principle of
optimality corresponds to a strong representation. A more general
version of the principle of optimality is given which corresponds
to a weak representation of a ddp by an sdp. It is shown that
sdp's having a strictly monotonic cost function are in one-to-one
correspondence with strong representations of ddp's. Finally, a
new derivation is given of the result that sdp's having a
monotonic cost function are in one-to-one correspondence with
weak representations of a ddp.

Our notion of a weak representation is new in that we require
neither all optimal solutions nor the correct cost of the optimal
solutions, but are satisfied with some optimal solutions.
Presumably, if the correct costs were required, one could compute
the cost of an optimal solution using the cost function of the
ddp after the solution has been found by some method. The notion
of strong representation was introduced, along with an even
stronger sense of representation, in (Ibaraki 1972).

2. Definitions. 



A discrete decision problem is intended as a general model
of combinatorial optimization problems. A discrete decision
problem (ddp) is a system $D = (A, S, P, f)$ where

    $A$ is a finite nonempty alphabet (set of primitive decisions),

    $S \subseteq A^*$ (set of feasible policies),

    $P$ is a set (the set of data inputs for the problem),

    $f: S \times P \to R$, where $R$ is the set of positive reals
    (cost or objective function).

An instance of a discrete decision problem $D$, denoted $D(p)$,
is given by a particular data input $p \in P$. A policy $s \in S$
is optimal with respect to input $p \in P$ if
$f(s,p) \le f(t,p)$ for all $t \in S$. The set of optimal policies
for the problem instance $D(p)$ is denoted $O(D,p)$. We will be
interested in the conditions under which the problem of finding
$O(D,p)$ or a subset of $O(D,p)$ can be formulated as a dynamic
program.

One of the simplest discrete decision problems is the problem
of finding the least cost path from the start node to a goal node
in an arc-weighted directed graph. This problem can be
represented as a ddp as follows: let $A$ be the set of arcs
$(i,j)$ in the graph, where $(i,j)$ represents the decision to
move from node $i$ to node $j$; $S$ is then the set of sequences
of arcs which move from the start node to a final node; $P$ is
the set of cost matrices $(p_{i,j})$, where $p_{i,j}$ is the cost
of arc $(i,j)$; and finally $f(s,p)$ is the cost of arc sequence
(path) $s$ with respect to input $p$; more precisely,

    $f(s,p) = \sum_{(i,j) \in s} p_{i,j}$.
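
As a brute-force illustration of this ddp, the sketch below enumerates the feasible policies $S$ and the optimal set $O(D,p)$ for a small graph. The three-node graph, the helper names, and the length bound `max_len` are assumptions made only for this example; they are not part of the formal model.

```python
from itertools import product

def path_cost(s, p):
    """f(s, p): sum of the arc costs p[(i, j)] over the arcs of policy s."""
    return sum(p[arc] for arc in s)

def feasible_policies(arcs, start, goals, max_len):
    """Enumerate S: arc sequences (length <= max_len) from start to a goal."""
    policies = []
    for n in range(1, max_len + 1):
        for s in product(arcs, repeat=n):
            chained = s[0][0] == start and all(
                s[i][1] == s[i + 1][0] for i in range(n - 1))
            if chained and s[-1][1] in goals:
                policies.append(s)
    return policies

def optimal_policies(S, p):
    """O(D, p): every feasible policy of minimum cost."""
    best = min(path_cost(s, p) for s in S)
    return [s for s in S if path_cost(s, p) == best]

# Hypothetical instance: node 0 is the start node, node 2 the goal node.
p = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 3.0}
S = feasible_policies(list(p), start=0, goals={2}, max_len=2)
opt = optimal_policies(S, p)
```

In this instance the direct arc and the two-arc path have the same cost, so `opt` contains both policies; a strong representation is required to recover both, while a weak one may return either.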
The functional equations of dynamic programming apply to a
kind of process called a sequential decision process. A
sequential decision process (sdp) is a system
$\Pi = (A, Q, q_0, Q_f, t, h, k, P)$ where

    $A$ is a finite nonempty alphabet (set of primitive decisions),

    $Q$ is a set (set of states),

    $q_0 \in Q$ (start state),

    $Q_f \subseteq Q$ (set of final states),

    $t: Q \times A \to Q$ (transition function),

    $h: R \times Q \times A \times P \to R$ (cost or objective function),

    $k: P \to R$ (initial cost function),

    $P$ is a set (input data specifications).

The transition function $t$ applies a decision $a \in A$ to a
state $q \in Q$, resulting in a transition to a new state
$t(q,a)$. We can extend the domain of $t$ to $Q \times A^*$ by
the following recursive definition: let $t(q,e) = q$ for
$q \in Q$, where $e$ is the empty sequence, and
$t(q,xa) = t(t(q,x),a)$ for $q \in Q$, $x \in A^*$, and
$a \in A$. Thus $t(q,xa)$ is the state resulting from applying
the decision sequence $xa$ to the initial state $q$. When only
one argument is given to $t$, the path is assumed to originate at
the start state; thus $t(x)$ is the state resulting from applying
the decision sequence $x$ from the start state. Let
$F(\Pi) = \{x \mid t(x) \in Q_f\}$. Each $x \in F(\Pi)$ is a
feasible decision sequence, which $t$ maps (by definition) from
$q_0$ to some final state $q_f \in Q_f$. Note that the first five
components of a sequential decision process comprise a finite
state automaton (Hopcroft and Ullman 1969). The cost function
$h(c,q,a,p)$ is the cost of reaching state $t(q,a)$ by a sequence
reaching state $q$ with cost $c$ which is extended by decision
$a$. The initial cost function $k(p)$ is the cost of a null
sequence given input $p$. It will be useful to consider the
special case of decision sequences applied to the start state as
follows: let $g(e,p) = k(p)$ and
$g(xa,p) = h(g(x,p), t(x), a, p)$ for $x \in A^*$, $a \in A$,
$p \in P$. Thus $g(x,p)$ gives the cost of reaching state $t(x)$
from $q_0$ by means of the sequence of decisions $x$. Finally,
since we are interested in optimal decision sequences, let us
define (and assume the existence of) $G(q_0,p) = k(p)$ and

    $G(q,p) = \min_{\{x \mid t(x) = q\}} g(x,p)$

for all $q \ne q_0$, $p \in P$; thus $G(q,p)$ is the cost of the
least cost decision sequence reaching state $q$ from $q_0$. We
say $x \in A^*$ is an optimal decision sequence reaching state
$q$ if $t(x) = q$ and $G(q,p) = g(x,p)$. The set of optimal
decision sequences reaching a final state of $\Pi$ is denoted
$O(\Pi,p)$. Note that $O(\Pi,p)$ is always nonempty since there
is at least one least cost sequence reaching each final state of
$\Pi$. An sdp $\Pi$ represents a ddp $D$ if $F(\Pi) = S$ and
$O(\Pi,p) \subseteq O(D,p)$.
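
The definitions of $t$, $g$, and $G$ can be traced on a toy example. The following is a minimal sketch, assuming a hypothetical three-state sdp, an additive cost function, and a bound on sequence length so that $G$ can be found by plain enumeration; the transition table is partial here, whereas the formal $t$ is total.

```python
from itertools import product

# Hypothetical sdp: states q0 (start), q1, qf (final); decisions a, b, c.
DELTA = {("q0", "a"): "q1", ("q0", "b"): "q1", ("q1", "c"): "qf"}
# Input p: one cost per (state, decision) pair (illustrative numbers).
P = {("q0", "a"): 2.0, ("q0", "b"): 1.0, ("q1", "c"): 4.0}

def t(x, q="q0"):
    """Extended transition function; None if a decision is undefined."""
    for a in x:
        q = DELTA.get((q, a))
        if q is None:
            return None
    return q

def h(c, q, a, p):
    """Additive cost function (an assumption): h(c, q, a, p) = c + p[(q, a)]."""
    return c + p[(q, a)]

def g(x, p, k=0.0):
    """g(e, p) = k(p); g(xa, p) = h(g(x, p), t(x), a, p)."""
    c, q = k, "q0"
    for a in x:
        c, q = h(c, q, a, p), DELTA[(q, a)]
    return c

def G(q, p, alphabet="abc", max_len=2):
    """Least cost over all sequences x (up to max_len) with t(x) = q."""
    return min(g(x, p)
               for n in range(max_len + 1)
               for x in product(alphabet, repeat=n)
               if t(x) == q)
```

Here the sequence `("b", "c")` is the only optimal decision sequence reaching `qf`, since `("a", "c")` costs more. Enumeration is used only to mirror the definition of $G$; the functional equations of Section 4 avoid it.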

3. Representations of a discrete decision problem.



Before turning to our primary problem of characterizing the
representations of a ddp by a dynamic program, we give necessary
and sufficient conditions for the representation, as defined
above, of a ddp by an sdp. We first summarize some concepts and
results on finite automata (Hopcroft and Ullman 1969) which will
be needed only in the present section. The equiresponse relation
of a finite automaton is defined by the relation $xRy$ iff
$t(x) = t(y)$ for all $x, y \in A^*$. An equivalence relation $R$
on $A^*$ is called right invariant if
$xRy \to (\forall z \in A^*)\; xzRyz$. If $R$ and $T$ are
equivalence relations on $A^*$ then $R$ refines $T$ if
$\forall x, y \in A^*$, $xRy \to xTy$. An equivalence relation
has finite rank if it has only a finite number of equivalence
classes. Note that the equiresponse relation of a finite
automaton is right invariant since
$t(x) = t(y) \to t(xz) = t(t(x),z) = t(t(y),z) = t(yz)$. Finally,
for some $S \subseteq A^*$ define the equivalence relation $R_S$
as follows:

    $x R_S y$ iff $(\forall z \in A^*)\; xz \in S \leftrightarrow yz \in S$.

The following proposition gives us an essential property of
finite automata.

Proposition 1. Let $S \subseteq A^*$ and let $R$ be a right
invariant equivalence relation of finite rank. Then $R$ is the
equiresponse relation of a finite automaton which accepts $S$ iff
$R$ refines $R_S$.

proof: see (Hopcroft and Ullman 1969, p. 29).

Theorem 1. An sdp $\Pi = (A,Q,q_0,Q_f,t,h,k,P)$ represents a ddp
$D = (A,S,P,f)$ iff the following conditions hold:

1. the equivalence relation $R$ defined by $xRy$ iff
   $t(x) = t(y)$ for $x, y \in A^*$ is a right invariant
   equivalence relation of finite rank which refines $R_S$.

2. $(\forall p \in P)(\exists x$ s.t. $t(x) \in Q_f)(\forall y$
   s.t. $t(y) \in Q_f)$  $g(y,p) \le g(x,p) \to y \in O(D,p)$.

proof: (if): Suppose that conditions 1 and 2 hold. By
proposition 1, $R$ is the equiresponse relation of a finite
automaton which accepts the language $S$, so $F(\Pi) = S$. Let
$x$ satisfy condition 2, so $(\forall y \in S$ s.t.
$t(y) \in Q_f)$ $g(y,p) \le g(x,p) \to y \in O(D,p)$. Let
$\hat{y} \in O(\Pi,p)$, so $(\forall y$ s.t. $t(y) \in Q_f)$
$g(\hat{y},p) \le g(y,p) \to g(\hat{y},p) \le g(x,p) \to
\hat{y} \in O(D,p)$; thus $O(\Pi,p) \subseteq O(D,p)$.

(only if): Suppose now that $\Pi$ represents $D$, so
$F(\Pi) = S$ and $O(\Pi,p) \subseteq O(D,p)$. $R$ is the
equiresponse relation of a finite automaton which accepts $S$, so
$R$ is a right invariant equivalence relation of finite rank. By
proposition 1, $R$ refines $R_S$, so condition 1 holds. Let
$\hat{y} \in O(\Pi,p)$; then $(\forall y$ s.t. $t(y) \in Q_f)$
$g(y,p) \le g(\hat{y},p) \to g(y,p) = g(\hat{y},p) \to
y \in O(\Pi,p) \to y \in O(D,p)$. Thus condition 2 holds. QED

There are several important aspects to our representations
of ddp's by sdp's which should be pointed out. In mapping from a
ddp to an sdp, we assume the notion of a state (the equivalence
classes of $R$ in theorem 1), the existence of the transition
function $t$ which depends only on the current state and input
decision, and a cost function which is separable in the sense
that the cost of adding a transition onto the end of a sequence
depends only on the current state, the input decision, and the
cost of the sequence (in general the cost might depend on all
previous decisions). This much structure is implicit in the
concept of a dynamic program. A closer examination of these
assumptions may be found in (Elmaghraby 1970).

4. Strong representations of a discrete decision problem.

Our purpose is to discover the conditions under which an sdp
$\Pi$ represents a ddp $D$ by means of a discrete dynamic
program. The principle underlying dynamic programming has been
formulated by Bellman in the Principle of Optimality (Bellman
1957) and can be paraphrased as follows:

An optimal sequence has the property that no matter what the
next-to-last state and the next-to-last decision are, the sequence
reaching the next-to-last state must be optimal.

This version of the principle of optimality is illustrated
in figure 1a. If for $a \in A$, $x \in A^*$, $xa$ is an optimal
sequence from state $q_0$ to $q_f$, then $x$ is an optimal
sequence from $q_0$ to the next-to-last state $q = t(x)$. In
general the principle of optimality implies that if $xy$, for
$x, y \in A^*$, is an optimal sequence from $q_0$ to $q_f$, then
$x$ is an optimal sequence from $q_0$ to $t(q_0,x)$ and $y$ is an
optimal sequence from $t(q_0,x)$ to $q_f$, as illustrated in
figure 1b. This illustration applies only to discrete sequences
and so should not be construed to demonstrate the full range of
dynamic programming, which is much broader.

In terms of an sdp the principle of optimality can be made
precise as follows:

    $(\forall p \in P)(\forall x \in A^*)(\forall a \in A)\;
     G(t(xa),p) = g(xa,p) \to G(t(x),p) = g(x,p)$        (1)

The following theorem states an equivalent form for (1). Let
$\Pi = (A,Q,q_0,Q_f,t,h,k,P)$ be an sdp. $h$ is s'-monotonic if
for all states $q \in Q$, optimal sequences $xa$ reaching state
$q$, and sequences $ya$ reaching $q$, we have
$g(x,p) < g(y,p) \leftrightarrow g(xa,p) < g(ya,p)$. An sdp
containing an s'-monotonic cost function is an s'-monotonic
sequential decision process (s'-msdp). We say $h$ is strictly
monotonic (s-monotonic) if for all $x, y \in A^*$ such that
$t(x) = t(y)$, $g(x,p) < g(y,p) \to g(xa,p) < g(ya,p)$. A
sequential decision process which contains an s-monotonic cost
function is called a strictly monotonic sequential decision
process (s-msdp).
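
Condition (1) can be checked mechanically when the sequences and their costs are listed explicitly. The sketch below is a minimal illustration under assumptions: sequences are strings with one character per decision, `seqs` maps each sequence $x$ to the pair $(t(x), g(x,p))$ for one fixed input $p$, and the state names and costs are hypothetical.

```python
def least_costs(seqs):
    """G(q): minimum cost over the listed sequences reaching q."""
    G = {}
    for state, cost in seqs.values():
        G[state] = min(cost, G.get(state, cost))
    return G

def principle_of_optimality_holds(seqs):
    """Check (1): whenever xa is optimal, its prefix x is optimal too."""
    G = least_costs(seqs)
    for xa, (state, cost) in seqs.items():
        if xa and cost == G[state]:
            prefix_state, prefix_cost = seqs[xa[:-1]]
            if prefix_cost != G[prefix_state]:
                return False
    return True

# Hypothetical costs: x and y reach q1, and both extend by a to q2.
tie = {"": ("q0", 0), "x": ("q1", 10), "y": ("q1", 12),
       "xa": ("q2", 16), "ya": ("q2", 16)}
no_tie = {"": ("q0", 0), "x": ("q1", 10), "y": ("q1", 12),
          "xa": ("q2", 16), "ya": ("q2", 18)}
```

With the `tie` costs, `ya` is optimal although its prefix `y` is not, so (1) fails; with the `no_tie` costs every optimal sequence has an optimal prefix and (1) holds.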

Theorem 2. (1) holds for an sdp $\Pi = (A,Q,q_0,Q_f,t,h,k,P)$ iff
$h$ is s'-monotonic.

proof: (only if): Suppose that (1) holds for some sdp $\Pi$ and
that $h$ is not s'-monotonic. Let $xa$ be an optimal sequence
reaching state $q$, let $q' = t(x)$, and let $y$ be a sequence
such that $t(x) = t(y)$. Suppose first that $g(x,p) < g(y,p)$ and
$g(xa,p) \ge g(ya,p)$. Since $G(q,p) = g(xa,p) \ge g(ya,p)$, we
have $g(xa,p) = g(ya,p)$. By (1),
$G(q',p) = g(x,p) = g(y,p)$, but this contradicts our assumption
that $g(x,p) < g(y,p)$. Thus
$g(x,p) < g(y,p) \to g(xa,p) < g(ya,p)$. Suppose instead we have
$g(xa,p) < g(ya,p)$ but $g(x,p) \ge g(y,p)$. Now
$g(x,p) \ne g(y,p)$ since $g(xa,p) \ne g(ya,p)$, so
$g(x,p) > g(y,p)$. But by (1) and our assumption that $xa$ is an
optimal sequence reaching $q$, we have
$G(q',p) = g(x,p) \le g(y,p)$ by definition of $G$. This
contradiction shows that
$g(xa,p) < g(ya,p) \to g(x,p) < g(y,p)$ when $xa$ is an optimal
sequence reaching state $q$. Thus (1) $\to$ $h$ is s'-monotonic.

(if): Suppose now that $h$ is s'-monotonic. If (1) does not
hold then for some sequence $xa$ such that $t(xa) = q$, we have
$G(q,p) = g(xa,p)$ but $G(q',p) \ne g(x,p)$ where $t(q',a) = q$.
For some $y \in A^*$ such that $t(x) = t(y)$ we have
$G(q',p) = g(y,p) < g(x,p)$. If $g(ya,p) = g(xa,p) = G(q,p)$ then
$h$ is not s'-monotonic (with respect to optimal sequence $xa$),
so we must have $g(ya,p) > g(xa,p)$. But since $h$ is
s'-monotonic we have $g(y,p) > g(x,p)$, which contradicts our
earlier finding that $g(y,p) < g(x,p)$. Thus (1) must hold. QED

In practice we wish to find optimal policies between states.
We define below the tables $T(q,p)$ which store the information
necessary to obtain optimal policies. Formally, for all
$q \in Q$, $p \in P$, $T(q,p)$ is a subset of $Q \times A$
($T: Q \times P \to 2^{Q \times A}$). A set of policies
$\Theta(q,p)$ is obtainable from the tables $T(q,p)$ as follows:
let

    $\Theta(q_0,p) = \{e\}$, where $e$ is the empty string,

    $\Theta(q,p) = \{ya \mid (q',a) \in T(q,p)$ and
    $y \in \Theta(q',p)\}$ for $q \ne q_0$.

A ddp $D = (A,S,P,f)$ is strongly-represented
(weakly-represented) by an sdp $\Pi = (A,Q,q_0,Q_f,t,h,k,P)$ if
i) $\Pi$ represents $D$, ii) the functional equations (2) and (3)
given below hold, and iii) for $q \in Q$, $p \in P$ the set of
policies obtainable from the tables $T(q,p)$ is the set (a
subset) of all optimal policies; in particular

    $\bigcup_{q \in Q_f} \Theta(q,p) = O(\Pi,p)$
    ($\bigcup_{q \in Q_f} \Theta(q,p) \subseteq O(\Pi,p)$ for a
    weak representation).

    $G(q_0,p) = k(p)$                                          (2)

    $G(q,p) = \min_{\{(q',a) \mid t(q',a) = q\}}
              h(G(q',p), q', a, p)$                            (3)

    $T(q,p) = \{(q',a) \mid t(q',a) = q,\;
              G(q,p) = h(G(q',p), q', a, p)\}$                 (4)

Note that if $\Pi$ strongly (weakly) represents $D$ then by (i)
$O(\Pi,p) = O(D,p)$ and thus
$\bigcup_{q \in Q_f} \Theta(q,p) = O(D,p)$
($\bigcup_{q \in Q_f} \Theta(q,p) \subseteq O(D,p)$); i.e., the
construction of the tables $\Theta$ by means of (2), (3), and (4)
results in the construction of all (a nonempty subset of) optimal
solutions to the ddp $D$.
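
Equations (2), (3), and (4) translate directly into a forward pass over the states. The following is a minimal sketch under stated assumptions: the state graph is acyclic and listed in topological order, every non-start state has at least one predecessor, the input $p$ is fixed and folded into `h`, and all names and costs are hypothetical.

```python
def solve(states, trans, h, k):
    """Solve (2) and (3) and build the tables (4).

    states: list in topological order, start state first.
    trans:  dict mapping (q', a) to the successor state t(q', a).
    h:      cost function h(c, q', a) for a fixed input p.
    k:      initial cost k(p).
    """
    G = {states[0]: k}          # equation (2)
    T = {states[0]: set()}
    for q in states[1:]:
        preds = [(qp, a) for (qp, a), dest in trans.items() if dest == q]
        G[q] = min(h(G[qp], qp, a) for qp, a in preds)                   # (3)
        T[q] = {(qp, a) for qp, a in preds if h(G[qp], qp, a) == G[q]}   # (4)
    return G, T

def theta(q, T, start):
    """Theta(q): the decision sequences obtainable from the tables T."""
    if q == start:
        return {""}
    return {y + a for qp, a in T[q] for y in theta(qp, T, start)}

# Hypothetical additive instance: one character per decision.
trans = {("q0", "x"): "q1", ("q0", "y"): "q1", ("q0", "z"): "q1",
         ("q1", "a"): "qf"}
w = {("q0", "x"): 10, ("q0", "y"): 12, ("q0", "z"): 10, ("q1", "a"): 6}
G, T = solve(["q0", "q1", "qf"], trans, lambda c, q, a: c + w[(q, a)], 0)
```

For this additive cost function `theta("qf", T, "q0")` returns both optimal sequences `xa` and `za` and omits the more expensive `ya`, which is the behavior a strong representation asks for.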

Lemma 1. $x \in \Theta(q,p) \to x$ is an optimal sequence
reaching state $q$.

proof: the lemma follows immediately from the stronger lemma 2
which is given in the appendix.

We do not require that an optimal sequence have the same cost in
$D$ as in $\Pi$. Our interest is in obtaining optimal solutions
and in making use of the functional equations (2) and (3). These
equations are characteristic of dynamic programming and are often
considered a direct translation of the principle of optimality.
We take (1) as a more direct translation and show next that, in
the sense of a strong representation, (1) and the equations (2)
and (3) are equivalent.

Theorem 3. A ddp $D = (A,S,P,f)$ is strongly-represented by an
sdp $\Pi = (A,Q,q_0,Q_f,t,h,k,P)$ iff $\Pi$ represents $D$ and
(1) holds.

proof: (if): Suppose that (1) holds and $\Pi$ represents $D$. In
order to show that the ddp $D$ is strongly-represented by the sdp
$\Pi$, we must show that $\Pi$ represents $D$ (which we have
assumed), that (2) and (3) hold, and that all optimal policies
may be obtained from the tables defined by (4). First, (2) holds
by definition of $G$. Let $H(q,p)$ denote the right hand side of
(3). We will show that $G(q,p) = H(q,p)$. Suppose that $ya$ is an
optimal policy reaching state $q$, so $G(q,p) = g(ya,p)$. Since
(1) holds we then have $G(\hat{q},p) = g(y,p)$ where
$t(\hat{q},a) = q$. Thus

    $G(q,p) = h(g(y,p),\hat{q},a,p) = h(G(\hat{q},p),\hat{q},a,p)
     \ge \min_{\{(q',a) \mid t(q',a) = q\}} h(G(q',p),q',a,p)
     = H(q,p)$,

or $G(q,p) \ge H(q,p)$.

Now let $H(q,p) = h(G(\hat{q},p),\hat{q},a,p)$ for some
$\hat{q} \in Q$ and suppose $G(\hat{q},p) = g(y,p)$ where
$t(y) = \hat{q}$; i.e., $y$ is an optimal policy reaching
$\hat{q}$. Since $t(ya) = q$, we have
$G(q,p) \le g(ya,p) = h(g(y,p),\hat{q},a,p) =
h(G(\hat{q},p),\hat{q},a,p) = H(q,p)$, thus
$G(q,p) \le H(q,p)$. Combining these results we have
$G(q,p) = H(q,p)$ and (3) holds.

By lemma 1 all policies in $\Theta(q,p)$ are optimal with
respect to $h$. Suppose though that not all optimal policies can
be obtained from (4). Let $xa$ be an optimal policy of shortest
length reaching state $q$ which is not in $\Theta(t(xa),p)$. Let
$t(x) = q'$. By (1) $x$ is optimal, thus $x \in \Theta(q',p)$
(since $x$ has shorter length than $xa$) and
$G(q',p) = g(x,p)$. Since $xa \notin \Theta(t(xa),p)$ we must
have $G(t(xa),p) < h(G(q',p),q',a,p) = h(g(x,p),q',a,p) =
g(xa,p)$, but this contradicts our assumption that $xa$ is an
optimal sequence reaching state $q$. Therefore
$(q',a) \in T(q,p)$ and by definition $xa \in \Theta(q,p)$, so
$\Theta(q,p)$ is the set of all optimal sequences reaching state
$q$. In particular
$\bigcup_{q \in Q_f} \Theta(q,p) = O(\Pi,p)$.

(only if): Suppose now that the ddp $D$ is
strongly-representable by the sdp $\Pi$. For some $q \in Q$,
$x \in A^*$ we are able to obtain all optimal policies reaching
state $q$ using (2), (3), and (4). Consider
$xa \in \Theta(q,p)$ where $t(xa) = q$, $t(x) = q'$. By lemma 1
$xa$ is an optimal sequence reaching state $q$. By definition
$x \in \Theta(q',p)$, and by lemma 1 $x$ is an optimal policy
reaching $q'$, so $G(q',p) = g(x,p)$. Thus (1) holds. $\Pi$
represents $D$ by assumption. QED



Corollary 1. A ddp $D = (A,S,P,f)$ is strongly-represented by an
sdp $\Pi = (A,Q,q_0,Q_f,t,h,k,P)$ iff $\Pi$ represents $D$ and
$\Pi$ is an s'-msdp.

proof: immediate from theorems 2 and 3.

The s'-monotonicity of the cost function of an sdp is an
essential ingredient in a strong representation of a ddp. It can
be shown however that any s'-monotonic cost function is
effectively equivalent to some strictly-monotonic cost function.
Given an s'-monotonic function $h$, define the function $g'$ (and
thereby $h'$ implicitly) as follows, where $q = t(xa)$:

    $g'(xa,p) = \begin{cases}
        g(xa,p)            & \text{if } G(q,p) = g(xa,p) \\
        G(q,p) + g'(x,p)   & \text{otherwise.}
    \end{cases}$                                               (5)

Define $G'(q,p) = \min_{\{x \mid t(x) = q\}} g'(x,p)$. Note that
by definition $G(q,p) = G'(q,p)$ for all states $q$ and inputs
$p$. Lemma 4 given in the appendix establishes the effective
equivalence of $h$ and $h'$ in the sense that the set of optimal
sequences obtained for each state is the same for both cost
functions.

Lemma 3. If $h$ is s'-monotonic then $h'$ defined by (5) is
strictly monotonic.

proof: Let $h'$ be defined from the s'-monotonic function $h$ by
(5). Suppose for $x, y \in A^*$ such that $t(x) = t(y)$, we have
$g'(x,p) < g'(y,p)$. We have two cases to consider in order to
show that $g'(xa,p) < g'(ya,p)$. Let $a \in A$ be such that
$t(xa) = q$. Case 1: $ya$ is not optimal. By construction of
$g'$, $g'(ya,p) = G(q,p) + g'(y,p)$ and $g'(xa,p)$ has the value
$G(q,p)$ or $G(q,p) + g'(x,p)$, either of which is strictly less
than $g'(ya,p)$. Case 2: $ya$ is an optimal sequence reaching
state $q$. If $ya$ is optimal then
$g'(ya,p) = g(ya,p) = G(q,p)$. Also by theorem 2, (1) holds so
$y$ is an optimal sequence; i.e.,
$g'(y,p) = g(y,p) = G(q',p) = G'(q',p)$, but this contradicts our
assumption that $g'(x,p) < g'(y,p) = G'(q',p)$. QED

Theorem 4. A ddp $D = (A,S,P,f)$ is strongly represented by an
sdp $\Pi = (A,Q,q_0,Q_f,t,h,k,P)$ iff there is a strictly
monotonic sdp $\Pi' = (A,Q,q_0,Q_f,t,h',k,P)$ which strongly
represents $D$.

proof: (if): Clearly any s-msdp is an s'-msdp, so by corollary 1
the statement of the theorem is consistent and $D$ is strongly
represented by $\Pi'$.

(only if): Suppose that $D$ is strongly represented by
$\Pi = (A,Q,q_0,Q_f,t,h,k,P)$; then by corollary 1 $h$ is an
s'-monotonic cost function. Consider $h'$ defined by (5), which
is s-monotonic by lemma 3. We need to show that
$\Pi' = (A,Q,q_0,Q_f,t,h',k,P)$ strongly represents $D$. (2)
holds by definition. In order to show that (3) holds, let $xa$ be
an optimal sequence reaching state $q$. By construction
$G(q,p) = G'(q,p)$ for all states $q \in Q$. Equation (3) then
holds for $G'$ since it holds for $G$ by corollary 1. Equation
(4) holds since lemma 4, given in the appendix, shows that
$\Theta'(q,p) = \Theta(q,p)$, so $\Theta'(q,p)$ is the set of all
optimal sequences reaching state $q$. Finally $\Pi'$ represents
$D$ since $F(\Pi') = F(\Pi) = S$ and

    $O(D,p) = O(\Pi,p) = \bigcup_{q \in Q_f} \Theta(q,p)
            = \bigcup_{q \in Q_f} \Theta'(q,p) = O(\Pi',p)$. QED

5. Weak representations of a discrete decision problem.

We have been looking at the conditions under which we can
find all optimal decision sequences reaching any state from
$q_0$. In practice we may relax this requirement and be satisfied
with some (or just one) optimal sequence to each state in $Q$. We
now explore the conditions under which this requirement can be
satisfied.

We have seen how a direct translation of the principle of
optimality helped to establish the conditions for its
application. In the more general situation faced now it may be
helpful to give a generalized principle of optimality which
applies when we are interested in obtaining only some optimal
decision sequences.

Generalized principle of optimality (forward version): If there
is an optimal sequence reaching state $q$, then there is an
optimal sequence reaching state $q$ with the property that no
matter what the last decision and last state $q'$ were, the
sequence reaching $q'$ is an optimal sequence.

Given $p \in P$, a sequence $xa$ is 1-optimal if
$G(t(xa),p) = g(xa,p)$ and $G(t(x),p) = g(x,p)$. This generalized
principle of optimality can be formalized as follows:

    $(\forall p \in P)(\forall q \in Q)$ there is a 1-optimal
    sequence reaching state $q$.                               (6)

In these terms we can reformulate the (original) principle of
optimality as follows: $\forall p \in P$, $\forall q \in Q$,
every optimal sequence reaching state $q$ is 1-optimal. Condition
(6) can be expressed solely in terms of the cost function $h$ as
given below in theorem 5. $h$ is b-monotonic if for all
$q \in Q$, some optimal sequence $xa$ reaching $q$, and all
sequences $ya \in A^*$ reaching $q$, we have
$g(xa,p) < g(ya,p) \to g(x,p) < g(y,p)$. An sdp
$\Pi = (A,Q,q_0,Q_f,t,h,k,P)$ in which $h$ is b-monotonic is a
b-monotonic sequential decision process (b-msdp).

Theorem 5. (6) holds iff $h$ is b-monotonic.

proof: (if): Consider an arbitrary state $q \in Q$ and let $h$
be b-monotonic. We will show there exists a 1-optimal sequence
reaching state $q$. Let $xa$ be an optimal sequence reaching
state $q$. Let $P(q)$ denote the set of sequences such that
$y \in P(q)$ iff $t(y) = q$. Partition $P(t(x))$ into two sets as
follows: let

    $Y(x,a) = \{y \mid y \in P(t(x)),\; g(xa,p) = g(ya,p),\;
                g(x,p) > g(y,p)\}$

    $Z(x,a) = \{z \mid z \in P(t(x)),\; g(xa,p) < g(za,p)\} \cup
              \{z \mid z \in P(t(x)),\; g(xa,p) = g(za,p),\;
                g(x,p) \le g(z,p)\}$

For any $z \in Z(x,a)$ we have $g(x,p) \le g(z,p)$, either by the
b-monotonicity of $h$ in the case that $g(xa,p) < g(za,p)$ or by
definition in the other case. Thus if $Y(x,a)$ is empty then
$G(t(x),p) = g(x,p)$ and $xa$ is a 1-optimal sequence reaching
state $q$. On the other hand, if $Y(x,a)$ is nonempty, we have
$g(y',p) = \min_{y \in Y(x,a)} g(y,p)$ for some
$y' \in Y(x,a)$. Then $g(y',p) \le g(y,p)$ for all
$y \in Y(x,a)$, and $g(y',p) < g(x,p) \le g(z,p)$ for all
$z \in Z(x,a)$, thus $G(t(x),p) = g(y',p)$. But
$g(y'a,p) = g(xa,p) = G(q,p)$, so $y'a$ is a 1-optimal sequence
reaching state $q$.

(only if): Suppose now that (6) holds. For an arbitrary state
$q$, let $G(q,p) = g(xa,p)$ and $G(q',p) = g(x,p)$ where
$t(q',a) = q$ and $t(x) = q'$; i.e., $xa$ is a 1-optimal sequence
reaching state $q$. Suppose that $h$ is not b-monotonic, so for
some sequence $ya$ we have $g(xa,p) < g(ya,p)$ and
$g(x,p) \ge g(y,p)$. By the 1-optimality of $xa$ we have
$g(x,p) = G(q',p) \le g(y,p)$. Furthermore we must have
$g(x,p) < g(y,p)$ since $g(x,p) = g(y,p) \to
h(g(x,p),t(x),a,p) = h(g(y,p),t(x),a,p)$; i.e.,
$g(xa,p) = g(ya,p)$. This contradiction shows that $h$ is
b-monotonic. QED



Theorem 6. A ddp $D = (A,S,P,f)$ is weakly-represented by an sdp
$\Pi = (A,Q,q_0,Q_f,t,h,k,P)$ iff $\Pi$ represents $D$ and (6)
holds.

proof: (only if): Suppose that the ddp $D = (A,S,P,f)$ is
weakly-represented by an sdp $\Pi = (A,Q,q_0,Q_f,t,h,k,P)$. By
definition $\Pi$ represents $D$. Now let $q$ be an arbitrary
state. By (3),

    $G(q,p) = \min_{\{(q',a) \mid t(q',a) = q\}}
              h(G(q',p),q',a,p)$.

Let $G(q,p) = h(G(\hat{q},p),\hat{q},a,p)$ and let
$G(\hat{q},p) = g(y,p)$; then
$G(q,p) = h(G(\hat{q},p),\hat{q},a,p) = h(g(y,p),\hat{q},a,p) =
g(ya,p)$. We have just shown that $ya$ is a 1-optimal sequence
reaching state $q$. Thus (6) holds.

(if): Suppose now that $\Pi$ represents $D$ and (6) holds. For
any state $q \in Q$, there exists a sequence $xa$ such that
$t(xa) = q$, $G(q,p) = g(xa,p)$, and $G(\hat{q},p) = g(x,p)$,
where $\hat{q} = t(x)$. Then
$G(q,p) = g(xa,p) = h(g(x,p),\hat{q},a,p) =
h(G(\hat{q},p),\hat{q},a,p)$, which implies that we can find the
value $G(q,p)$ by minimizing the expression
$h(G(q',p),q',a,p)$ over all $q' \in Q$, $a \in A$ such that
$t(q',a) = q$, and thus we get (3). (2) follows by definition. By
definition all elements of $\Theta(q,p)$ are optimal sequences
which reach state $q$. To see that $\Theta(q,p)$ is nonempty,
note that since (6) holds there is a sequence $xa$ such that
$G(q,p) = g(xa,p)$ and $G(q',p) = g(x,p)$ where $t(q',a) = q$,
and by definition such an $xa$ is in $\Theta(q,p)$. Finally
$\Pi$ represents $D$ by assumption. QED

Corollary 2. A ddp $D = (A,S,P,f)$ is weakly-representable by an
sdp $\Pi = (A,Q,q_0,Q_f,t,h,k,P)$ iff $\Pi$ represents $D$ and
$\Pi$ is a b-msdp.

proof: immediate from theorems 5 and 6.

We have now characterized the classes of sdp's which weakly
and strongly represent ddp's. The difference between these two
types of representations is illustrated in figure 2. Here $h$ is
b-monotonic but $h$ is not s'-monotonic. According to equation
(3), in order to determine an optimal sequence reaching $q$, we
consider an extension of an optimal sequence reaching $q'$. But
in restricting the search to optimal sequences reaching $q'$,
equation (3) overlooks the optimal sequence $ya$ reaching $q$.
This illustrates why b-msdp's can only weakly-represent a ddp.
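
The four costs given in figure 2 are enough to replay this argument numerically. The sketch below is a minimal illustration, assuming that $x$ and $y$ both reach $q'$ and that the single decision $a$ leads from $q'$ to $q$; the variable names are introduced only for this example.

```python
# Costs from figure 2: x and y reach q'; xa and ya reach q.
g = {"x": 10, "y": 12, "xa": 16, "ya": 16}

G_qp = min(g["x"], g["y"])      # G(q', p)
G_q = min(g["xa"], g["ya"])     # G(q, p)

# All optimal sequences reaching q, found by direct comparison.
optimal_at_q = {s for s in ("xa", "ya") if g[s] == G_q}

# Equations (3) and (4) only extend optimal sequences reaching q'.
optimal_at_qp = {s for s in ("x", "y") if g[s] == G_qp}
theta_q = {s + "a" for s in optimal_at_qp if g[s + "a"] == G_q}

# Monotonicity conditions, taking xa as the optimal sequence.
s_prime_monotonic = (g["x"] < g["y"]) == (g["xa"] < g["ya"])
b_monotonic = (not g["xa"] < g["ya"]) or (g["x"] < g["y"])
```

The tables yield only `xa`, a proper subset of the optimal sequences `xa` and `ya`, which matches the statement that $h$ is b-monotonic but not s'-monotonic in this figure.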

        g(x,p) = 10        g(xa,p) = 16
        g(y,p) = 12        g(ya,p) = 16

                    Figure 2.

The conditions established for the weak-representation of a
ddp are necessary in order to take care of fairly pathological
cost functions. It can be shown however that the cost function
of any sdp which weakly represents a ddp is equivalent to other
cost functions with nicer properties. Given a cost function $h$
which is b-monotonic, define the function $g'$ (and thereby $h'$)
as follows:

    $g'(x,p) = \begin{cases}
        g(x,p)           & \text{if } x \text{ is 1-optimal} \\
        G(t(x),p) + 1    & \text{otherwise.}
    \end{cases}$                                               (7)

Define $G'(q,p) = \min_{\{x \mid t(x) = q\}} g'(x,p)$. Lemma 4
given in the appendix establishes the effective equivalence of
$h$ and $h'$ in the sense that the set of optimal sequences
obtained for each state is the same for both cost functions.
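
The effect of (7) can be seen on the costs of figure 2. The following is a minimal sketch, assuming one character per decision, a start state `q0` with initial cost 0, and the convention (not stated in the text) that the empty sequence counts as 1-optimal.

```python
# seqs maps a decision sequence to (state reached, cost g) for a fixed p.
seqs = {"": ("q0", 0), "x": ("q'", 10), "y": ("q'", 12),
        "xa": ("q", 16), "ya": ("q", 16)}

def least_costs(costs):
    """Minimum cost per state over the listed sequences."""
    G = {}
    for state, c in costs.values():
        G[state] = min(c, G.get(state, c))
    return G

G = least_costs(seqs)

def is_optimal(x):
    state, c = seqs[x]
    return c == G[state]

def is_one_optimal(x):
    # Assumption: the empty sequence has no prefix and is treated as 1-optimal.
    return is_optimal(x) and (x == "" or is_optimal(x[:-1]))

def g_prime(x):
    """Equation (7): keep g for 1-optimal sequences, else G(t(x)) + 1."""
    state, c = seqs[x]
    return c if is_one_optimal(x) else G[state] + 1

new_costs = {x: (seqs[x][0], g_prime(x)) for x in seqs}
G_prime = least_costs(new_costs)
```

Under $g'$ the sequence `ya` is no longer optimal (its cost becomes `G["q"] + 1`), while the least costs per state are unchanged, so the tie that the tables could not resolve disappears.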

$h$ is monotonic if $\forall x, y \in A^*$, $\forall a \in A$
such that $t(x) = t(y)$,
$g(x,p) \le g(y,p) \to g(xa,p) \le g(ya,p)$. An sdp with cost
function $h$ which is monotonic is a monotonic sequential
decision process (m-sdp).
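
The difference between monotonic and strictly monotonic cost functions can be probed on sample values. The sketch below is a finite spot check, not a proof; it assumes the state, decision, and input are held fixed so that `h` reduces to a function of the incoming cost alone, and both example functions are hypothetical.

```python
def is_monotonic(h, costs, strict=False):
    """Check c1 <= c2 -> h(c1) <= h(c2) over sampled costs.

    With strict=True, check c1 < c2 -> h(c1) < h(c2) instead.
    """
    for c1 in costs:
        for c2 in costs:
            if strict:
                if c1 < c2 and not h(c1) < h(c2):
                    return False
            elif c1 <= c2 and not h(c1) <= h(c2):
                return False
    return True

def additive(c):
    """Hypothetical additive step: the decision adds a fixed cost of 6."""
    return c + 6

def bottleneck(c):
    """Hypothetical bottleneck step: the cost is the larger of c and 16."""
    return max(c, 16)

samples = [10, 12, 20]
```

The additive step passes both checks, while the bottleneck step is monotonic but not strictly monotonic, since it maps the incoming costs 10 and 12 to the same value.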

Lemma 5. If for some sdp $\Pi = (A,Q,q_0,Q_f,t,h,k,P)$ $h$ is
b-monotonic then $h'$ defined by (7) is monotonic.

proof: Consider the function $h'$ defined in (7). $h'$ can be
shown to be monotonic as follows. Let $t(x) = t(y) = q'$,
$t(q',a) = q$ and $g'(x,p) \le g'(y,p)$. If $xa$ is 1-optimal
then $g'(xa,p) = g(xa,p) = G(q,p)$, and since $g'(ya,p)$ has the
value $G(q,p)$ or $G(q,p) + 1$, $g'(xa,p) \le g'(ya,p)$. Suppose
now that $ya$ is 1-optimal; then $G(q,p) = g(ya,p)$ and
$G(q',p) = g(y,p)$, $g'(ya,p) = g(ya,p)$ and
$g'(y,p) = g(y,p) = G(q',p) = g'(x,p)$ (since
$g'(x,p) \le g'(y,p)$). But if $g'(x,p) = g'(y,p)$ then
$g'(xa,p) = h'(g'(x,p),q',a,p) = h'(g'(y,p),q',a,p) = g'(ya,p)$
(thus $g'(xa,p) \le g'(ya,p)$). If neither $xa$ nor $ya$ is
1-optimal then $g'(xa,p) = g'(ya,p) = G(q,p) + 1$. In all cases
the monotonicity of $h'$ is shown. QED

The following result is well known (Elmaghraby 1970, Bonzon 
1970) in the sense that dynamic programs are in one to one 
correspondence with monotonic sdp's. However to the author's 
knowledge it has not been pointed out that m-sdp's can only 
weakly represent a ddp; i.e., one is not guaranteed to be able to 
obtain all optimal solutions from a representation by a m-sdp. 

Theorem 7. A ddp $D = (A,S,P,f)$ is weakly-represented by some
sdp $\Pi = (A,Q,q_0,Q_f,t,h,k,P)$ iff there is an m-sdp
$\Pi' = (A,Q,q_0,Q_f,t,h',k,P)$ which weakly-represents $D$.

proof: (if): We must show that an m-sdp can represent $D$. Let
$xa$ be an optimal sequence reaching $q$, so
$G(q,p) = g(xa,p)$. Suppose $g(xa,p) < g(ya,p)$ yet
$g(x,p) \ge g(y,p)$. By the monotonicity of $h'$, we get
$g(xa,p) \ge g(ya,p)$, which contradicts our assumption. Thus
$g(x,p) < g(y,p)$ and $h'$ is b-monotonic. By corollary 2,
$\Pi'$ weakly-represents $D$.

(only if): Suppose that $D$ is weakly-represented by an sdp
$\Pi = (A,Q,q_0,Q_f,t,h,k,P)$, and $h'$ is defined by (7) from
$h$; then by corollary 2, $h$ is b-monotonic and by lemma 5 $h'$
is monotonic. We can show that $D$ is weakly-represented by the
sdp $\Pi' = (A,Q,q_0,Q_f,t,h',k,P)$. (2) holds by definition.
Let $x \in A^*$ be a 1-optimal sequence reaching state
$q \in Q$, so $G(q,p) = g(x,p)$. Such a sequence exists by
theorem 6. By construction $g'(x,p) = g(x,p)$, so
$G'(q,p) = G(q,p)$ for all states $q \in Q$. Equation (3) must
hold for $G'(q,p)$ since it holds for $G(q,p)$ as a result of
corollary 2. Lemma 4 shows that $\Theta(q,p) = \Theta'(q,p)$, so
$\Theta'(q,p)$ is a nonempty subset of optimal sequences. Finally
$\Pi'$ represents $D$ since $F(\Pi') = F(\Pi) = S$ and

    $O(\Pi',p) = \bigcup_{q \in Q_f} \Theta'(q,p)
               = \bigcup_{q \in Q_f} \Theta(q,p)
               = O(\Pi,p) \subseteq O(D,p)$. QED



6. Conclusion. 



This paper has given necessary and sufficient conditions for
the strong and weak representation of a discrete decision problem
by a sequential decision process. Strictly monotonic (monotonic)
sequential decision processes have been shown to be equivalent in
the strong (weak) representation sense to the class of discrete
decision problems which can be formulated as discrete dynamic
programs. We have shown that the problems to which the principle
of optimality applies are a subclass of the problems to which the
functional equations of dynamic programming are applicable.

Appendix 

In order to establish lemma 1 we will need the following
definition and lemma. We say $x \in A^*$ is completely-optimal if
every initial segment $y$ of $x$ (every $y \in A^*$ such that
there exists $z \in A^*$ with $yz = x$) is 1-optimal.

Lemma 2. $xa \in \Theta(q,p)$ iff $xa$ is completely optimal.

proof: by induction on the length of a sequence. Let the length
of $x$ be 1, i.e., $x \in A$. $x \in \Theta(q,p)$ iff
$(q_0,x) \in T(q,p)$ and $e \in \Theta(q_0,p)$, where $e$ is the
empty sequence and $t(x) = q$. By definition
$e \in \Theta(q_0,p)$, and $x \in \Theta(q,p)$ iff
$G(q,p) = g(x,p)$ iff $x$ is an optimal sequence.

Induction step: Assume that the lemma holds for any sequence
of length $< m$ and let the length of the sequence $xa$ be $m$.
$xa \in \Theta(q,p)$ iff $(q',a) \in T(q,p)$ and
$x \in \Theta(q',p)$ where $t(q',a) = q$. By induction hypothesis
$x \in \Theta(q',p)$ iff $x$ is completely optimal. This implies
that $G(q',p) = g(x,p)$. Also $(q',a) \in T(q,p)$ iff
$G(q,p) = h(G(q',p),q',a,p) = h(g(x,p),q',a,p) = g(xa,p)$ ($xa$
is 1-optimal and $x$ is completely optimal $\to$ $xa$ is
completely optimal), i.e., $xa$ is completely optimal. QED

The following lemma establishes the effective equivalence of $h$
and $h'$ defined by (5) in the sense that the set of optimal
sequences obtained for each state is the same for both cost
functions. The lemma also holds true for $h'$ defined by equation
(7).

Lemma 4. $\forall q \in Q$, $\forall p \in P$,
$\Theta(q,p) = \Theta'(q,p)$.

proof: $x \in \Theta(q,p)$ iff $x$ is completely optimal (by
lemma 2),
iff $x = a_1 a_2 \cdots a_n$ and $a_1 \cdots a_i$ is 1-optimal
with respect to $h$ for $i = 1, \ldots, n$,
iff $g'(a_1,p) = g(a_1,p) = G'(t(a_1),p)$ and ... and
$g'(a_1 \cdots a_n,p) = g(a_1 \cdots a_n,p) =
G'(t(a_1 \cdots a_n),p)$ by construction,
iff $x$ is completely optimal with respect to $h'$,
iff $x \in \Theta'(q,p)$ (by lemma 2).

QED